Weight generation in machine learning

ABSTRACT

Technologies are generally described for systems, devices and methods relating to a machine learning environment. In some examples, a processor may identify a training distribution of a training data. The processor may identify information about a test distribution of a test data. The processor may identify a coordinate of the training data and the test data. The processor may determine, for the coordinate, differences between the test distribution and the training distribution. The processor may determine weights based on the differences. The weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation application under 35 U.S.C. § 120 of U.S. patent application Ser. No. 14/451,899, filed on Aug. 5, 2014, which is a non-provisional application that claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/015,200, filed on Jun. 20, 2014 and U.S. Provisional Patent Application No. 61/907,499, filed on Nov. 22, 2013. U.S. Provisional Patent Application No. 62/015,200, U.S. Provisional Patent Application No. 61/907,499, and U.S. patent application Ser. No. 14/451,899 are herein incorporated by reference in their entireties.

This application is related to U.S. patent application Ser. No. 15/261,390, filed on Sep. 9, 2016, entitled “WEIGHT BENEFIT EVALUATOR FOR TRAINING DATA”, U.S. patent application Ser. No. 14/451,870, filed on Aug. 5, 2014, entitled “GENERATION OF WEIGHTS IN MACHINE LEARNING”, and U.S. patent application Ser. No. 14/451,935, filed on Aug. 5, 2014, entitled “ALTERNATIVE TRAINING DISTRIBUTION DATA IN MACHINE LEARNING”.

BACKGROUND

Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

Machine learning involves systems that may be trained with data in order to learn from the data and make generalizations based on the data. A trained machine learning system may take inputs and predict outputs or labels. In some examples, machine learning techniques may predict outputs by solving a classification or regression problem. Machine learning systems may be effective to classify data, make recommendations, and/or predict various outcomes based on training a learning algorithm with data.

SUMMARY

In some examples, methods in a machine learning environment are generally described. In various examples, the methods may include identifying, by a processor, a training distribution of a training data. In still other examples, the methods may also include identifying, by the processor, information about a test distribution of a test data. In other examples, the methods may also include identifying, by the processor, a coordinate of the training data and the test data. In yet other examples, the methods may also include determining, by the processor, for the coordinate, differences between the test distribution and the training distribution. In other examples, the methods may also include determining, by the processor, weights based on the differences. In some examples, the weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution.

In some examples, methods to determine a weight for training data are generally described. The methods may include, by a processor, identifying first points of the training data. In various other examples, the methods may also include, by the processor, identifying information about a test data. The test data may include second points. In other examples, the methods may also include, by the processor, identifying a coordinate of the first and second points. In some examples, the coordinate may include a range of values in a coordinate space. In various other examples, the methods may also include, by the processor, dividing the range of values in the coordinate space into bins. In some examples, respective bins may define subsets of the range of values. In other examples, the methods may also include, by the processor, determining a first frequency. In some examples, the first frequency may relate to a first percentage of the first points being located within a particular bin. In various examples, the methods may also include, by the processor, determining a second frequency. The second frequency may relate to a second percentage of the second points being located within the particular bin. In some other examples, the methods may further include, by the processor, comparing the first frequency and the second frequency. In further examples, the methods may include, by the processor, determining the weight for the training data, based at least in part on the comparison of the first and second frequencies.

In some other examples, computing devices are generally described. In some examples, the computing devices may include a processor and a memory configured to be in communication with the processor. In other examples, the memory may be effective to store training data. In various examples, the training data may include first points. In some examples, the memory may be effective to store test data. The test data may include second points. In some other examples, the processor may be effective to identify a coordinate of the first and second points. In various examples, the coordinate may include a range of values in a coordinate space. In some further examples, the processor may be effective to divide the range of values in the coordinate space into bins. In some examples, respective bins may define subsets of the range of values. In some examples, the processor may be effective to determine a first frequency. The first frequency may relate to a first percentage of the first points which are located within a particular bin. In further examples, the processor may be effective to determine a second frequency. In some cases, the second frequency may relate to a second percentage of the second points which are located within the particular bin. In various other examples, the processor may be further effective to compare the first frequency and the second frequency. In other examples, the processor may be further effective to determine a weight for the training data, based at least in part on the comparison of the first and second frequencies. In some examples, the memory may be further effective to store the weight.

The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.

BRIEF DESCRIPTION OF THE FIGURES

The foregoing and other features of this disclosure will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings, in which:

FIG. 1 illustrates an example system that can be utilized to implement weight generation in machine learning;

FIG. 2 depicts the example system of FIG. 1 with additional details related to a weight generation module;

FIG. 3 depicts a flow diagram for an example process to implement weight generation in machine learning;

FIG. 4 illustrates an example computer program product that can be utilized to implement weight generation in machine learning;

FIG. 5 is a block diagram illustrating an example computing device that is arranged to implement weight generation in machine learning;

all arranged according to at least some embodiments described herein.

DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. The aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations.

This disclosure is generally drawn to, inter alia, methods, apparatus, systems, devices, and computer program products related to weight generation in machine learning.

Briefly stated, technologies are generally described for systems, devices and methods relating to a machine learning environment. In some examples, a processor may identify a training distribution of a training data. For example, the training data may include a set of points that follows a probability distribution, reflecting a probability of certain inputs or outputs. The processor may identify information about a test distribution of a test data. The processor may identify a coordinate of the training data and the test data. The coordinate may be, for example, a number of movies rated by users. The processor may determine, for the coordinate, differences between the test distribution and the training distribution, such as differences in the popularity of movies. The processor may determine weights based on the differences. The weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution.

FIG. 1 illustrates an example system 100 that can be utilized to implement weight generation in machine learning, arranged according to at least some embodiments described herein. As depicted, system 100 may include a computing device 102. Computing device 102 may include a processing module 104, a memory 106, a weight generation module 108, and a machine learning module 110, all configured to be in communication with one another. Processing module 104 may be hardware and may be configured to execute one or more instructions. For example, processing module 104 may be configured to execute one or more instructions stored in memory 106. Memory 106 may be further effective to store one or more machine learning algorithms 114. Machine learning algorithms 114 may include instructions and/or sets of instructions effective to produce a function 116 when executed by machine learning module 110.

As will be discussed in further detail below, machine learning module 110 may be effective to use one or more machine learning algorithms 114 and training data 118 to generate or train function 116. An example of function 116 may be a function to determine a credit score. In some examples, training data 118 may include one or more points 130. Points 130 may include sets of associated inputs 122a and outputs 124a. For example, an input with an income X and debt Y may result in a credit score Z. In some examples, a training distribution of training data 118 may be identified by processing module 104. In various other examples, processing module 104 may be effective to identify points 130 of training data 118. Training data 118 may be stored in memory 106. Points 130 of training data 118 may follow a particular training distribution. For example, the training distribution may indicate a range of income levels at a first instance in time. In some examples, the training distribution may be a probability distribution. Training data 118 may be generated at an instance in time which may be prior to generation of function 116. In some examples, function 116 may be effective to determine outputs 124b (such as, for example, determinations, classifications, predictions, and/or recommendations) based on inputs 122b of test data 120 provided to function 116. In some examples, outputs 124b may be referred to as “labels.”

Test data 120 may include a number of points 131 which may follow a particular test distribution. For example, the test distribution may indicate a range of income levels at a second instance in time. In some examples, test data 120 may be generated at an instance in time which is later than the instance in time at which training data 118 is generated. In some examples, the test distribution may be a probability distribution. The test distribution of test data 120 may be different from the training distribution of training data 118. In some examples, some information may be known about the test distribution of test data 120 prior to the input of test data 120 into function 116. For example, publicly available information, such as census data, may be accessed to indicate changes in income or population between training and test data. In some examples, information about a test distribution of test data 120 may be identified by processing module 104. In some examples, information about a test distribution may include statistics such as a mean and/or standard deviation of the test distribution. Additionally, information about the test distribution may include estimations of projections of the test distribution. For example, histograms of points 131 along a coordinate may result in an estimate of the projection of the test distribution along the coordinate. Test data 120 and/or information about test data 120 may be stored in memory 106.

Weight generation module 108 may be effective to determine and/or calculate weights 112 for each point 130 of training data 118. Weights 112 may be applied to points 130 of training data 118 such that, after application of weights 112, points 130 of training data 118 may follow a probability distribution that resembles, matches, and/or conforms to the probability distribution of test data 120. Weights 112 may be adapted to cause the training distribution to conform to the test distribution. Machine learning module 110 may receive weights 112 from weight generation module 108. Machine learning algorithms 114 may use weights 112, and/or training data 118, to generate a weighted function 132. Weighted function 132 may be effective to determine outputs or labels 124c (such as, for example, determinations, classifications, predictions, and/or recommendations) based on the application of inputs 122c to weighted function 132. In some examples, some labels generated by weighted function 132 may be different from labels generated by function 116, even where the same input values are applied to function 116 and weighted function 132.
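As a minimal sketch of how per-point weights could change the function that a learning algorithm produces, the following fits a straight line by weighted least squares. The learning algorithm, the helper name `weighted_line_fit`, the data, and the weight values are all assumptions chosen for illustration; the disclosure does not prescribe a particular machine learning algorithm 114.

```python
def weighted_line_fit(xs, ys, weights):
    """Fit y = slope * x + intercept by weighted least squares."""
    w_sum = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    slope = cov / var
    return slope, y_mean - slope * x_mean


# Hypothetical training points (input, output).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 6.0]

# Unweighted fit: plays the role of function 116.
slope_u, intercept_u = weighted_line_fit(xs, ys, [1.0, 1.0, 1.0, 1.0])

# Weighted fit with hypothetical weights: plays the role of weighted function 132.
slope_w, intercept_w = weighted_line_fit(xs, ys, [1.5, 1.5, 0.5, 0.5])
```

With these illustrative numbers the two fits differ, so the same input can yield different labels from the unweighted and the weighted function.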

FIG. 2 depicts the example system 100 of FIG. 1 with additional details related to a weight generation module, arranged in accordance with at least some embodiments described herein. FIG. 2 is substantially similar to system 100 of FIG. 1, with additional details. Those components in FIG. 2 that are labeled identically to components of FIG. 1 will not be described again for the purposes of clarity and brevity.

In some examples, as is explained in more detail below, weight generation module 108 may receive training data 118 from memory 106, or from another source. Weight generation module 108 may identify and/or choose one or more coordinates 210 (including, e.g., 210₁ . . . 210_(n)) of training data 118. Coordinates 210 may be, for example, one or more parameters or dimensions of points 130. Each of coordinates 210 may include a range of values in a coordinate space. A coordinate space may be, for example, a Euclidean or other geometric space for a particular coordinate 210. For example, if machine learning module 110 relates to generation of a credit score, coordinate 210 may relate to income, debt, etc. Weight generation module 108 may divide the range of values of each coordinate space into one or more bins. Respective bins may define subsets of the range of values for each coordinate. For example, weight generation module 108 may divide each identified and/or chosen coordinate 210 into one or more bins (such as, for example, “Bin 1”, “Bin 2”, “Bin 3”, etc.).

In further summarizing the detailed discussion below, weight generation module 108 may determine respective values for points 130 along each identified and/or chosen coordinate 210. Weight generation module 108 may determine frequencies of a number of points 130, 131 located within respective bins, for respective coordinates 210. A frequency may be, for example, a percentage of points 130 located within a particular bin, relative to the total number of points 130, for a particular coordinate 210. Weights 112 may be chosen for each point 130 of training data 118, based on the frequency of points 130 located in each bin, for each coordinate, and based on information about points in test data 120. Machine learning module 110 may produce weighted function 132 based on weights 112 and/or training data 118.

Inputs 122a of training data 118 may be vectors which may include one or more parameters. In an example where machine learning algorithms 114 are designed to produce a function to recommend movies to a user, some example parameters of inputs 122a may include an age of the user, an annual salary, a number of movies rated by the user, a location where the user lives, etc. Weight generation module 108 may choose one or more of the parameters as coordinates 210 (including coordinates 210₁, 210₂, . . . , 210_(n)). Weight generation module 108 may be effective to evaluate points 130 by examining each point of points 130 on a coordinate-by-coordinate basis. Each coordinate 210 may be divided into a number of bins (such as “Bin 1”, “Bin 2”, “Bin 3”, etc.). In an example where the chosen coordinate 210 is annual salary, Bin 1 may range from $0-$25,000, Bin 2 may range from $25,000-$50,000, Bin 3 may range from $50,000-$75,000, etc. Each point 130 may include a parameter related to annual salary. A location of each point of points 130 may be determined along the annual salary coordinate 210. For example, a first point of points 130 may include an annual salary parameter value of $42,000. Accordingly, the first point may be located in Bin 2. The number of points 130 located within each bin may be determined by weight generation module 108 and may be divided by the total number of points 130 to produce a frequency for each bin. As will be described in further detail below, weights 112 may be determined and/or calculated based on differences between frequencies calculated for the training distribution and frequencies calculated for the test distribution.
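The binning and frequency computation described above can be sketched as follows. This is an illustration under stated assumptions: the helper names `bin_index` and `bin_frequencies`, the salary values, and the choice that a value lying exactly on a bin boundary falls into the upper bin are not taken from the disclosure.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the 0-based bin that `value` falls into.

    `edges` holds the interior bin boundaries in ascending order, so there
    are len(edges) + 1 bins. Assumption: a value equal to a boundary is
    placed in the upper bin.
    """
    return bisect_right(edges, value)


def bin_frequencies(values, edges):
    """Fraction of `values` in each bin (count in bin / total count)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bin_index(v, edges)] += 1
    total = len(values)
    return [c / total for c in counts]


# Annual salary coordinate: index 0 is "Bin 1", index 1 is "Bin 2", and so on.
salary_edges = [25_000, 50_000, 75_000]

# Hypothetical salary values for training points and test points.
train_salaries = [18_000, 42_000, 47_000, 61_000]
test_salaries = [42_000, 58_000, 66_000, 90_000]

train_freq = bin_frequencies(train_salaries, salary_edges)
test_freq = bin_frequencies(test_salaries, salary_edges)

# Per-bin difference between test and training frequencies.
freq_diff = [t - r for t, r in zip(test_freq, train_freq)]
```

Here a salary of $42,000 maps to index 1, which corresponds to Bin 2 in the example above.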

Weight generation module 108 may generate weights 112 using equation (1):

$\begin{matrix}{\omega_{i} = 1 + \sum\limits_{c = 1}^{C}\mu_{c}\left( \theta_{c}(i) \right)\quad\text{for } i \in R} & (1)\end{matrix}$

where ω_(i) may be a weight (e.g., one weight of weights 112) for a particular point i among points 130. μ_(c) may quantify a difference between the number of points 131 in test data 120, in a particular bin (such as, for example, “Bin 1”, “Bin 2”, “Bin 3”, and/or “Bin 4”), and the weighted sum of the number of points 130 in training data 118, in the particular bin, divided by the number of points in training data 118, in the particular bin. In some examples, μ_(c) may be calculated for each coordinate c (of coordinates 210). C may represent the total number of identified and/or chosen coordinates 210. θ_(c) may be a function that may determine in which bin a particular point i, of points 130, falls.
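Equation (1) can be illustrated with a short sketch. The data layout is an assumption made for this example: `bins[i][c]` stands in for θ_(c)(i), and `mu[c][b]` stands in for μ_(c) evaluated at bin b; the numeric values are hypothetical.

```python
def weights_from_mu(bins, mu):
    """Equation (1): w_i = 1 + sum over coordinates c of mu[c][bins[i][c]].

    bins[i][c] is the bin index of point i along coordinate c.
    mu[c][b] is the correction value for bin b of coordinate c.
    """
    return [1.0 + sum(mu[c][b] for c, b in enumerate(point_bins))
            for point_bins in bins]


# Hypothetical values: three training points, two coordinates (C = 2).
bins = [(0, 1), (1, 1), (1, 0)]
mu = [[0.5, -0.25],   # coordinate 0, bins 0 and 1
      [0.0, 0.25]]    # coordinate 1, bins 0 and 1

weights = weights_from_mu(bins, mu)
```

A point whose bins are over-represented in the test data receives a weight above 1, and a point whose bins are under-represented receives a weight below 1.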

Weight generation module 108 may determine the values for μ_(c) using equation (2):

$\begin{matrix}{{\mu_{c}\left( \tau_{c} \right)} = {\frac{1}{n_{c}\left( \tau_{c} \right)}\left( {{N_{R}{v_{c}\left( \tau_{c} \right)}} - {n_{c}\left( \tau_{c} \right)} - {\sum\limits_{\underset{{\theta_{c}{(i)}} = \tau_{c}}{i \in R}}^{C}{\sum\limits_{\underset{k = 1}{k \neq c}}^{C}{\mu_{k}\left( \tau_{c} \right)}}}} \right)}} & (2)\end{matrix}$

n_(c) may be a vector representing the current count number of points 130 in training data 118 in each of the bins, of a particular coordinate c, among coordinates 210. τ_(c) may represent the current count number of bins for a particular coordinate c. N_(R) may represent the number of points 130 in training data 118. υ_(c) may represent a frequency of points in test data 120 appearing in a particular bin (such as, for example, “Bin 1”, “Bin 2”, “Bin 3”, “Bin 4”, etc.), for a particular coordinate c among coordinates 210, relative to the total number of points 130.

An iterative process may be used by weight generation module 108 to determine μ_(c). In some examples, all μ_(c) may be initialized to zero or some other value. A first comparison value μ_(c)(τ_(c)) may be identified for each bin of each coordinate using equation (2). The computed value of μ_(c)(τ_(c)) may be plugged into equation (2) iteratively, to produce difference values. A value of μ_(c)(τ_(c)) may be iteratively updated until a convergent value of μ_(c)(τ_(c)) is reached. The convergent value of μ_(c)(τ_(c)) may be used in equation (1) to produce weights 112 for each point 130 of training data 118. In some examples, values of μ_(c)(τ_(c)) used while iterating equation (2) may be based on fractions of differences of values used in the previous iteration according to equation (3):

$\begin{matrix}{\mu_{new}^{\prime} = \alpha\,\mu_{new} + \left( 1 - \alpha \right)\mu_{old}} & (3)\end{matrix}$

with α=0.1 or α=0.01, where μ_(old) may be the value of μ used during a previous iteration of equation (2) to compute μ_(new). Equation (3) may use μ_(new) and μ_(old) to calculate μ′_(new), which may be used in subsequent iterations of equation (2). α may be a variable used to control a degree to which new values of μ (e.g., μ′_(new)) depend upon previous values of μ (e.g., μ_(old)), when iterating equation (2).
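The iterative procedure may be sketched as below. This is a sketch under stated assumptions rather than a definitive implementation. The function names, the data layout, the stopping tolerance, and the example data are hypothetical. In particular, the inner sum of equation (2) is read here as the sum of μ_(k) evaluated at the bin in which each point i falls along each other coordinate k, which is an interpretation chosen so that the weighted training count in every bin approaches N_(R) times the test frequency.

```python
def fit_mu(bins, test_freq, alpha=0.1, max_iterations=10_000, tol=1e-12):
    """Iterate equation (2), damping each update with equation (3).

    bins[i][c]      -> bin of training point i along coordinate c
    test_freq[c][b] -> test frequency of bin b of coordinate c
    Assumption: the inner sum of equation (2) uses mu_k at the bin of
    point i along coordinate k, for every k other than c.
    """
    n_points = len(bins)
    n_coords = len(test_freq)
    mu = [[0.0] * len(freqs) for freqs in test_freq]  # initialize to zero
    for _ in range(max_iterations):
        max_change = 0.0
        for c in range(n_coords):
            counts = [0] * len(test_freq[c])   # n_c for each bin
            cross = [0.0] * len(test_freq[c])  # contribution of other coordinates
            for point in bins:
                b = point[c]
                counts[b] += 1
                cross[b] += sum(mu[k][point[k]]
                                for k in range(n_coords) if k != c)
            for b, n in enumerate(counts):
                if n == 0:
                    continue  # empty training bin: nothing to reweight
                mu_new = (n_points * test_freq[c][b] - n - cross[b]) / n
                damped = alpha * mu_new + (1 - alpha) * mu[c][b]  # equation (3)
                max_change = max(max_change, abs(damped - mu[c][b]))
                mu[c][b] = damped
        if max_change < tol:
            break  # treated as convergence
    return mu


def weights_from_mu(bins, mu):
    """Equation (1): one weight per training point."""
    return [1.0 + sum(mu[c][b] for c, b in enumerate(point)) for point in bins]


# Hypothetical data: four training points, two coordinates, two bins each.
bins = [(0, 0), (0, 1), (1, 0), (1, 1)]
test_freq = [[0.75, 0.25], [0.5, 0.5]]

mu = fit_mu(bins, test_freq)
weights = weights_from_mu(bins, mu)
```

For this small example the weights approach 1.5, 1.5, 0.5 and 0.5, so the weighted training counts along the first coordinate become 3 and 1, matching the test frequencies 0.75 and 0.25 for four points. A smaller α slows each update but damps oscillation between coordinates.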

Among other potential benefits, weight generation in machine learning arranged in accordance with the present disclosure may allow for simplified matching of test and training distributions to improve the predictive capability of machine learning systems. Additionally, by choosing a number of bins on a coordinate-by-coordinate basis, weight generation in machine learning in accordance with the present disclosure may account for differences between training data sets and test data sets which result from the effects of a finite sample size. Changes may be identified between training data and test data which may occur over time as a result of changing opinions, trends, fashions, etc. In some examples, taking such changes into account may result in machine learning systems with better predictive ability. Recommendation systems, or predictions of time series such as the stock market, may benefit from the described system.

FIG. 3 depicts a flow diagram for an example process to implement weight generation in machine learning, arranged in accordance with at least some embodiments described herein. In some examples, the process in FIG. 3 could be implemented using system 100 discussed above and could be used to generate weights for machine learning. An example process may include one or more operations, actions, or functions as illustrated by one or more of blocks S2, S4, S6, S8 and/or S10, etc. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the particular implementation. Blocks may be supplemented with additional blocks representing other operations, actions, or functions. The process in FIG. 3 may be used by a processor, such as processing module 104, or by a machine learning module, such as machine learning module 110, as described above.

Processing may begin at block S2, “Identify, by the processor, a training distribution of a training data.” At block S2, a processor may identify a training distribution of training data.

Processing may continue from block S2 to block S4, “Identify, by the processor, information about a test distribution of a test data.” At block S4, the processor may identify information about a test distribution of the test data. In an example, the training data may be generated at a first instance in time and the test data may be generated at a second instance in time. The second instance in time may be later than the first instance in time.

Processing may continue from block S4 to block S6, “Identify, by the processor, a coordinate of the training data and the test data.” At block S6, the processor may identify a coordinate of the training data and the test data. In some examples, a range of values in coordinate spaces may be divided into a number of bins. For example, a coordinate 210 may be divided into one or more bins, such as “Bin 1”, “Bin 2”, “Bin 3”, etc., as depicted in FIG. 2.

Processing may continue from block S6 to block S8, “Determine, by the processor, for the coordinate, differences between the test distribution and the training distribution.” At block S8, the processor may determine, for the coordinate, differences between the test distribution and the training distribution.

Processing may continue from block S8 to block S10, “Determine, by the processor, weights based on the differences, the weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution.” At block S10, the processor may determine weights based on the differences. The weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution. For example, weights and training data may be applied to a machine learning algorithm to generate a weighted function. Test data may be applied to the weighted function as an input. In an example, labels may be generated in response to application of the test data to the weighted function. In some examples, the labels may include at least one of recommendations, classifications, predictions, and/or determinations. In some examples, determining the weights may include iteratively determining differences between the training distribution and the test distribution. In some further examples, the weights may be determined based on a convergent value of the differences between the training distribution and the test distribution. In some other examples, determining the weights may be further based on the number of the first and second points which are located in bins. In another example, weights may be effective to conform a particular point in the training distribution to a particular point in the test distribution.

FIG. 4 illustrates an example computer program product 400 that can be utilized to implement weight generation in machine learning, arranged in accordance with at least some embodiments described herein. Program product 400 may include a signal bearing medium 402. Signal bearing medium 402 may include one or more instructions 404 that, in response to execution by, for example, a processor, may provide the functionality and features described above with respect to FIGS. 1-3. Thus, for example, referring to system 100, processing module 104 and/or machine learning module 110 may undertake one or more of the blocks shown in FIG. 4 in response to instructions 404 conveyed to system 100 by medium 402. In some examples, instructions 404 may be stored in a memory, such as memory 106.

In some implementations, signal bearing medium 402 may encompass a computer-readable medium 406, such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, memory, etc. In some implementations, signal bearing medium 402 may encompass a recordable medium 408, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc. In some implementations, signal bearing medium 402 may encompass a communications medium 410, such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.). Thus, for example, program product 400 may be conveyed to one or more modules of the system 100 by an RF signal bearing medium 402, where the signal bearing medium 402 is conveyed by a wireless communications medium 410 (e.g., a wireless communications medium conforming with the IEEE 802.11 standard).

FIG. 5 is a block diagram illustrating an example computing device 500 that is arranged to implement weight generation in machine learning, arranged in accordance with at least some embodiments described herein. In a very basic configuration 502, computing device 500 typically includes one or more processors 504 (such as, for example, processing module 104) and a system memory 506 (such as, for example, memory 106). A memory bus 508 may be used for communicating between processor 504 and system memory 506.

Depending on the desired configuration, processor 504 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 504 may include one or more levels of caching, such as a level one cache 510 and a level two cache 512, a processor core 514, and registers 516. An example processor core 514 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 518 may also be used with processor 504, or in some implementations memory controller 518 may be an internal part of processor 504.

Depending on the desired configuration, system memory 506 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. System memory 506 may include an operating system 520, one or more applications 522, and program data 524. Application 522 may include weight generation in machine learning algorithm 526 that is arranged to perform the functions and operations as described herein including those described with respect to FIGS. 1-4 in connection with system 100. Program data 524 may include weight generation in machine learning data 528 that may be useful to implement weight generation in machine learning as is described herein. In some embodiments, application 522 may be arranged to operate in cooperation with program data 524 and/or operating system 520 such that weight generation in machine learning may be provided. This described basic configuration 502 is illustrated in FIG. 5 by those components within the inner dashed line.

Computing device 500 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 502 and any required devices and interfaces. For example, a bus/interface controller 530 may be used to facilitate communications between basic configuration 502 and one or more data storage devices 532 via a storage interface bus 534. Data storage devices 532 may be removable storage devices 536, non-removable storage devices 538, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDDs), optical disk drives such as compact disc (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSDs), and tape drives, to name a few. Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.

System memory 506, removable storage devices 536 and non-removable storage devices 538 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 500. Any such computer storage media may be part of computing device 500.

Computing device 500 may also include an interface bus 540 for facilitating communication from various interface devices (e.g., output devices 542, peripheral interfaces 544, and communication devices 546) to basic configuration 502 via bus/interface controller 530. Example output devices 542 include a graphics processing unit 548 and an audio processing unit 550, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 552. Example peripheral interfaces 544 include a serial interface controller 554 or a parallel interface controller 556, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 558. An example communication device 546 includes a network controller 560, which may be arranged to facilitate communications with one or more other computing devices 562 over a network communication link via one or more communication ports 564.

The network communication link may be one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media. The term computer readable media as used herein may include both storage media and communication media.

Computing device 500 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 500 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.

The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its spirit and scope. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those enumerated herein, will be apparent from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present disclosure is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled. It is to be understood that this disclosure is not limited to particular methods, reagents, compounds, compositions or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

In general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). If a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

For any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as “up to,” “at least,” “greater than,” “less than,” and the like include the number recited and refer to ranges which can be subsequently broken down into subranges as discussed above. Finally, a range includes each individual member. Thus, for example, a group having 1-3 cells refers to groups having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments are possible. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

What is claimed is:
 1. A method to improve predictive capability of a machine learning system, the method comprising: obtaining, by a processor of a computer, training data stored in a memory; identifying, by the processor, a training distribution of the training data; obtaining, by the processor, test data stored in the memory; identifying, by the processor, information about a test distribution of the test data; identifying, by the processor, coordinates of the training data and the test data; determining, by the processor, for each identified coordinate, differences between the test data and the training data; determining, by the processor, weights based on the determined differences, wherein the weights are adapted to cause the training distribution to match to the test distribution in response to the weights being applied to the training distribution; generating, by the processor, a function based on the training distribution and the weights; and generating labels, in response to an application of the test data to the function, wherein generating labels in response to the application of the test data to the function facilitates matching of the test distribution and the training distribution to improve a predictive capability of the machine learning system.
 2. The method of claim 1, wherein the training distribution of the training data includes a probability distribution.
 3. The method of claim 1, wherein: the training data is generated at a first instance in time; and the function is generated at a second instance in time, wherein the second instance in time is later than the first instance in time.
 4. The method of claim 1, wherein the information about the test distribution of data comprises one of a mean, standard deviation, and projection estimation of the test distribution.
 5. The method of claim 1, wherein the function is effective to determine labels as one or more of: determinations, recommendations, classifications, and predictions.
 6. The method of claim 1, wherein: determining the weights comprises iteratively determining differences between the training data and the test data, and the weights are determined based on a convergent value of the determined differences between the training data and the test data.
 7. The method of claim 1, wherein: the training data includes a number of points, the coordinates include a range of values in a coordinate space, and the method further comprises: dividing the range of values in the coordinate space into bins; and wherein determining the weights is further based on the number of points and the number of the bins.
 8. A method to improve predictive capability of a machine learning system, the method comprising, by a computer: identifying a training distribution of first points of training data; identifying information about a test distribution of second points of test data; determining a first frequency for the training distribution of the first points; determining a second frequency for the test distribution of the second points; comparing the first frequency and the second frequency; calculating a weight based on differences between the first frequency and the second frequency; generating a weighted function based on the training distribution and the calculated weight; generating labels, in response to an application of the test data to the weighted function; wherein the weighted function facilitates a match of a particular point in the training distribution to a particular point in the test distribution, and wherein the match improves the predictive capability of the machine learning system.
 9. The method of claim 8, wherein the test distribution includes a probability distribution.
 10. The method of claim 8, further comprising: identifying one or more coordinates of each of the first points and the second points; dividing the identified one or more coordinates into one or more bins.
 11. The method of claim 10, wherein: the first frequency relates to a first percentage of the first points that are located within a particular bin, and the second frequency relates to a second percentage of the second points that are located within the particular bin.
 12. The method of claim 8, wherein each of the first points of training data is evaluated by examining each point of the first points of the training data on a coordinate-by-coordinate basis.
 13. The method of claim 10, wherein comparing the first frequency and the second frequency includes: identifying a first comparison value; comparing frequency values of the test data and the training data in the one or more bins to produce a difference value; updating the first comparison value to produce a second comparison value that is based on the difference value; and iteratively repeating the identifying the first comparison value, the comparing the frequency values of the test data and the training data in the one or more bins to produce the difference value, and the updating the first comparison value to produce the second comparison value that is based on the difference value, until the second comparison value converges to a convergent value.
 14. The method of claim 13, wherein updating the first comparison value to produce the second comparison value that is based on the difference value comprises: adding a fraction of the difference value to the first comparison value to produce the second comparison value.
 15. A computing device, comprising: a first processor; a second processor; and a memory configured to be in communication with the first processor and the second processor, wherein the memory is effective to store training data and test data, wherein the first processor is effective to: retrieve training data stored in the memory; identify a training distribution of the training data; retrieve test data from the memory; identify information about a test distribution of the test data; identify one or more coordinates of the training data and the test data; determine a first frequency, wherein the first frequency relates to the training data; determine a second frequency, wherein the second frequency relates to the test data; compare the first frequency and the second frequency; and determine a weight for the training data, wherein the determination of the weight is based at least, in part, on the comparison of the first frequency and the second frequency, and wherein the second processor is effective to: generate a weighted function based on the determined weight and the training data; and operate the computing device to generate labels, in response to an application of the test data to the weighted function, wherein the generation of the labels in response to the application of the test data to the weighted function facilitates a match of the test distribution and the training distribution to improve a predictive capability of the computing device.
 16. The device of claim 15, wherein the one or more coordinates includes a coordinate having a range of values in a coordinate space.
 17. The device of claim 15, wherein the training data comprises first points and the test data comprises second points.
 18. The device of claim 17, wherein: the first points follow the training distribution, the second points follow the test distribution, and the weight is effective to match a particular point in the training distribution to a particular point in the test distribution.
 19. The computing device of claim 16, wherein: the first processor is further effective to divide the one or more coordinates into bins, and the bins define subsets of the range of values in the coordinate space.
 20. The computing device of claim 19, wherein the first processor is further effective to: identify a first comparison value; compare frequency values of the test data and the training data in the bins to produce a difference value; update the first comparison value to produce a second comparison value that is based on the difference value; iteratively repeat the identification of the first comparison value, the comparison of frequency values of the test data and the training data in the bins to produce the difference value, and the update of the first comparison value to produce the second comparison value that is based on the difference value, until the second comparison value converges to a convergent value; and store the converged second comparison value in the memory.
 21. The computing device of claim 20, wherein the first processor is further effective to update the first comparison value to produce the second comparison value that is based on the difference value, by addition of a fraction of the difference value to the first comparison value.
 22. The computing device of claim 15, wherein the second processor is further effective to: store the weighted function in the memory.
 23. The device of claim 19, wherein the first processor is further effective to determine a weight for the training data, based on a number of first and second points located in a particular bin and based on a number of the bins.
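
By way of non-limiting illustration only, the bin frequencies, the iterative update of a comparison value by a fraction of a difference value, and the resulting weights recited above may be sketched in Python for a single coordinate. The sketch is not part of the claims and is not asserted to be the disclosed implementation. All names (bin_index, bin_frequencies, converge_comparison_values, training_weights), the choice to seed the comparison values with the training frequencies, the default fraction of 0.5, the tolerance, and the rule that a bin weight is the converged value divided by the training frequency are assumptions made for the example.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value (hypothetical helper)."""
    i = bisect_right(edges, value) - 1
    # Clamp so values on or beyond the outer edges land in the first or last bin.
    return min(max(i, 0), len(edges) - 2)


def bin_frequencies(points, edges):
    """Fraction of the points located within each bin along one coordinate."""
    counts = [0] * (len(edges) - 1)
    for p in points:
        counts[bin_index(p, edges)] += 1
    return [c / len(points) for c in counts]


def converge_comparison_values(train_freq, test_freq, fraction=0.5,
                               tol=1e-9, max_iter=10_000):
    """Repeatedly add a fraction of the per-bin difference until it is negligible.

    Assumption: the comparison values start at the training frequencies and the
    difference value is (test frequency - current comparison value).
    """
    values = list(train_freq)
    for _ in range(max_iter):
        diffs = [q - c for q, c in zip(test_freq, values)]
        if max(abs(d) for d in diffs) < tol:
            break
        values = [c + fraction * d for c, d in zip(values, diffs)]
    return values


def training_weights(train, test, edges, fraction=0.5):
    """One weight per training point, taken from the bin the point falls in."""
    train_freq = bin_frequencies(train, edges)
    test_freq = bin_frequencies(test, edges)
    target = converge_comparison_values(train_freq, test_freq, fraction)
    # Assumption: a bin with no training points receives weight 0.
    bin_weight = [t / p if p > 0 else 0.0 for t, p in zip(target, train_freq)]
    return [bin_weight[bin_index(x, edges)] for x in train]


# Two bins on [0, 1]: the training data favours the lower bin,
# the test data favours the upper bin.
edges = [0.0, 0.5, 1.0]
train = [0.1, 0.2, 0.3, 0.7]
test = [0.2, 0.6, 0.8, 0.9]
weights = training_weights(train, test, edges)
```

In this example, training points in the under-represented upper bin receive a weight greater than one and points in the over-represented lower bin receive a weight less than one, so that the weighted share of training points in each bin approaches the share of test points in that bin.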